Abstract

In Probabilistic Logic Nilsson uses the device of a probability
distribution over the set of possible worlds to assign probabilities to the
sentences of a logical language. In his paper Nilsson concentrated on
inference and associated computational issues. This paper, on the
other hand, examines the probabilistic semantics in more detail,
particularly for the case of first order languages, and attempts to explain
some of the features and limitations of this form of probability logic.

The device of assigning probabilities to logical sentences has certain
expressive limitations. In particular, statistical assertions are not
easily expressed by such a device. This leads to certain difficulties
with attempts to give probabilistic semantics to default reasoning
using probabilities assigned to logical sentences.
1  Introduction


In [1] Nilsson describes a method of assigning probabilities to the sentences
of a logic through a probability distribution over the set of possible worlds.
Each possible world is a unique and consistent set of truth value assignments
for all of the sentences in the logic. Although this approach is unproblematic
when applied to propositional languages, certain difficulties arise when
dealing with first order languages. By taking a different tack these
difficulties can be overcome, and indeed, it has already been demonstrated
that probabilities can be coherently assigned to the sentences of any first
order language (Gaifman [2], Scott and Krauss [3]).

While the method of assigning probabilities to logical formulas is capable
of representing probabilistic degrees of belief, it is incapable of effectively
representing statistical assertions. It is argued that many types of defaults
have a natural statistical interpretation, but cannot be represented by
probabilities over logical formulas, because of this limitation. Some authors
have attempted to represent defaults in exactly this way (Geffner and Pearl [4],
Pearl [5]), and the difficulties with their systems can be demonstrated.

It is pointed out that although probabilities over logical formulas fail to
do the job, statistical assertions can be efficiently represented in other types
of probability logics, logics which go beyond the simple extension of first
order logic offered by Nilsson’s probabilistic logic.

2  The  Propositional  Case 

A natural semantic model for a propositional language is simply a subset
of the set of atomic symbols (Chang and Keisler [6]). This subset is the
set of atomic symbols which are assigned the truth value true (t). Hence,
in the propositional case Nilsson’s concept of possible worlds, i.e., a set of
consistent truth value assignments, has a natural correspondence with the
set of semantic models. Each possible world is completely determined by
its truth value assignments to the atomic symbols of the language, and the
assignments to the atomic symbols can be viewed as being the characteristic
function of a semantic model (with t = 1, f = 0).

For example, in a propositional language with two atomic symbols {A, B}
there are four possible worlds with corresponding semantic models ($\sigma$ is
used to indicate the truth evaluation function).

1.  $\{A^\sigma = t, B^\sigma = t\}$ or {A, B}.

2.  $\{A^\sigma = t, B^\sigma = f\}$ or {A}.

3.  $\{A^\sigma = f, B^\sigma = t\}$ or {B}.

4.  $\{A^\sigma = f, B^\sigma = f\}$ or {}, i.e., the empty set.

An equivalent way of looking at things, which will turn out to be more
useful when we move to first order languages, is to consider the atoms¹ of the
language. When the language has a finite number of atomic symbols each
possible world can be represented as a single sentence: a sentence formed by
conjoining each atomic symbol or its negation; such a sentence is called an
atom. Corresponding to the four cases above we have the four atoms $A \wedge B$,
$A \wedge \neg B$, $\neg A \wedge B$, and $\neg A \wedge \neg B$.

Given a probability distribution over the set of possible worlds it is
possible to assign a probability to each sentence of the language. Each sentence
is given a probability equal to the probability of the set of worlds in which
it is true.

Equivalently, a probability distribution can be placed over the sentences of
the logic, more precisely over the Lindenbaum-Tarski algebra of the language.
This algebra is generated by grouping the sentences into equivalence classes.
Two sentences, $\alpha$ and $\beta$, are in the same equivalence class iff
$\vdash_0 \alpha \leftrightarrow \beta$, where $\vdash_0$ indicates deducible
from the propositional axioms.

This technique is not limited to languages with a finite number of atomic
symbols. When the language is finite, however, the atoms will be sentences
of the language, and the probability distribution will be completely specified
by the probabilities of the atoms (more precisely, of their equivalence
classes, or e-classes). Any sentence can be written as a disjunction of a
unique set of atoms, and its probability will be the sum of the probabilities
of these atoms. For example, if we specify the probabilities
$\{A \wedge B = 0.5,\ A \wedge \neg B = 0.1,\ \neg A \wedge B = 0.2,\ \neg A \wedge \neg B = 0.2\}$,
then the sentence $A \vee B$ will have probability 0.8 as it can be written as
$(A \wedge B) \vee (A \wedge \neg B) \vee (\neg A \wedge B)$.
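The sum-over-atoms computation can be sketched in a few lines of Python. This is only an illustration of the example above: the dictionary `atom_prob` and the helper `probability` are names introduced here, and sentences are modelled as Boolean functions of the truth values of A and B.

```python
# Probabilities of the four atoms, keyed by the truth values of (A, B).
# The numbers are the ones used in the example in the text.
atom_prob = {
    (True, True): 0.5,    # A and B
    (True, False): 0.1,   # A and not B
    (False, True): 0.2,   # not A and B
    (False, False): 0.2,  # not A and not B
}


def probability(sentence):
    """Sum the probabilities of the atoms (worlds) in which `sentence` is true."""
    return sum(p for (a, b), p in atom_prob.items() if sentence(a, b))


# A or B is true in three of the four worlds.
p_a_or_b = probability(lambda a, b: a or b)
print(round(p_a_or_b, 10))
```

Any other sentence over A and B can be evaluated the same way by passing a different Boolean function.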

¹An atom in a Boolean algebra is a minimal non-zero element (Bell and Machover, [7]).
3  First Order Languages


When  the  move  is  made  to  first  order  languages  certain  problems  arise.  The 
first  problem  is  that  we  lose  the  nice  correspondence  between  possible  worlds 
and  semantic  models.  The  normal  semantic  model  for  a  first  order  language  is 
considerably  more  complex  than  the  model  for  a  propositional  language,  and 
the  truth  value  of  the  sentences  in  a  first  order  language  is  determined  both 
by  the  model  and  by  an  interpretation  (i.e.,  the  mapping  from  the  symbols 
to the semantic entities). For a given truth value assignment to the sentences
(possible world) there will be many different (in fact infinitely many)
model/interpretation pairs which will yield the same truth values. Hence,
the  semantic  structure  of  the  possible  worlds  is  unclear. 

Another  difficulty,  which  Nilsson  is  aware  of,  is  that  Nilsson’s  techniques 
depend  on  being  able  to  generate  consistent  truth  value  assignments  for  a  set 
of  sentences.  These  are  used  as  0/1  column  vectors  in  his  V  matrix.  This 
technique  is  limited  to  languages  in  which  the  consistency  of  a  finite  set  of 
sentences  can  be  established.  The  consistency  of  a  set  of  first  order  sentences 
is  not  decidable,  except  in  special  cases  (see  Ackermann  [8]  for  an  interesting 
survey). 

These  difficulties  can  be  avoided  if  instead  of  probability  distributions  over 
possible  worlds  we  consider  probability  distributions  over  the  Lindenbaum- 
Tarski  (L-T)  algebra  of  the  language.  It  has  already  been  demonstrated  by 
Gaifman  [2]  that  a  probability  measure  can  be  defined  over  the  L-T  algebra  of 
sentences  of  a  first  order  language.  Every  sentence  in  the  language  will  have  a 
probability  equal  to  the  probability  of  its  equivalence  class,  and  furthermore, 
the  probabilities  will  satisfy  the  condition 

If $\vdash \neg(\alpha \wedge \beta)$ then $p[\alpha \vee \beta] = p[\alpha] + p[\beta]$,

where $\vdash$ indicates deducible from the first order axioms. This means that the
probability measure preserves the partial order of the algebra. In this partial
order we have $\alpha \le \beta$ iff $\alpha \wedge \beta = \alpha$; hence,
$p[\beta] = p[\beta \wedge \alpha] + p[\beta \wedge \neg\alpha]$ (by
the above condition), and $p[\beta] = p[\alpha] + p[\beta \wedge \neg\alpha] \ge p[\alpha]$.
Under this partial order the conjunction and disjunction operators generate the
greatest lower bound (infimum) and least upper bound (supremum).
To examine what happens to quantified sentences under such a probability
measure it is sufficient to note that for L-T algebras we have that

$$|\exists x\, \alpha| = \bigvee_{t \in T} |\alpha(x/t)|, \tag{$*$}$$

where $|\alpha|$ indicates the e-class of the formula, and $T$ is the set of terms of
the  language.  What  this  means  is  that  each  existentially  quantified  sentence 
is  equal  to  the  supremum  of  all  its  instantiations.  This  implies  that  the 
probability  of  any  existentially  quantified  sentence  must  be  greater  than  or 
equal  to  the  probability  of  any  of  its  instances.  Similarly,  the  probability  of 
any  universally  quantified  sentence  must  be  less  than  or  equal  to  any  of  its 
instances. 

This interpretation also makes sense in terms of Nilsson’s possible worlds.
In any possible world the existential must be true if any of its instantiations
are. Hence, the set of possible worlds in which the existential is true includes
the set of possible worlds in which any instantiation is true, and the
existential must have a probability greater than or equal to the probability of
any of its instances.
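The possible-worlds argument can be checked on a small finite case. The sketch below makes several assumptions that are not in the text: a single unary predicate, three hypothetical terms `a`, `b`, `c` that name every individual, and an arbitrary non-uniform weighting of the worlds. Each world is a tuple of truth values, one per instance.

```python
from itertools import product

terms = ["a", "b", "c"]  # hypothetical terms, assumed to name every individual

# A world fixes the truth value of P(t) for each term t.
worlds = list(product([False, True], repeat=len(terms)))

# Arbitrary non-uniform weights, normalised into a probability distribution.
weights = [i + 1 for i in range(len(worlds))]
total = sum(weights)
prob = {w: wt / total for w, wt in zip(worlds, weights)}


def p(event):
    """Probability of the set of worlds in which `event` holds."""
    return sum(pr for w, pr in prob.items() if event(w))


# Probability of each instance P(t), of the existential, and of the universal.
p_instances = [p(lambda w, i=i: w[i]) for i in range(len(terms))]
p_exists = p(lambda w: any(w))
p_forall = p(lambda w: all(w))

print(p_forall, p_instances, p_exists)
```

Whatever weights are chosen, the existential is at least as probable as every instance and the universal is at most as probable as every instance, because the corresponding sets of worlds are nested.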

4  The Representation of Statistical Knowledge

Probabilities attached to logical sentences can be interpreted as degrees of
belief in those sentences. Instead of either asserting a sentence or its negation,
as in ordinary logic, one can attach some intermediate degree to it, a degree of
belief. So, for example, one could represent a degree of belief of greater than
0.9 in the assertion “Tweety can fly” by assigning the sentence
$\mathit{Fly}(\mathit{Tweety})$ a probability $> 0.9$. However, it is not easy to
represent statistical information, for example, the assertion “More than 90% of
all birds fly.”*

*It is the case that first order logic is in some sense universally expressive. That is,
set theory can be constructed in first order logic, and thus, sufficient mathematics can be
built up inside the language to represent statements of this form. This is not, however, an
efficient representation, nor is there any direct reflection in the semantics of the statistical
information. Since probabilities attached to logical formulas generalize ordinary logic,


First,  propositional  languages  do  not  seem  to  possess  sufficient  power  to 
represent  these  kinds  of  statements.  This  particular  statistical  statement 
is  an  assertion  which  indicates  some  relationship  between  the  properties  of 
being  a  bird  and  being  able  to  fly,  but  it  is  not  an  assertion  about  any 
particular bird. This indicates that some sort of variable is required, i.e.,
this statement cannot be encoded as a statement about any particular bird.
Propositional  languages  do  not  have  variables,  and  so  are  inadequate  for  this 
task  even  when  they  are  generalized  to  take  on  probabilities  instead  of  just 
1/0  truth  values. 

When we move to first order languages we do get access to variables,
variables which can range over the set of individuals. A seemingly reasonable
way to represent this statement is to consider the probabilistic generalization
of the universal sentence $\forall x\, \mathit{Bird}(x) \rightarrow \mathit{Fly}(x)$.
The universal in 1/0 first order logic says that all birds fly, so if we attach a
probability of $> 0.9$ perhaps we will get what we need. Unfortunately, this does
not work. If there is a single bird who is thought to be unable to fly, this
universal will be forced to have a probability close to zero. That is, the
probability of this universal must be
$1 - p[\exists x\, \mathit{Bird}(x) \wedge \neg\mathit{Fly}(x)]$. Hence, if one
believes to degree greater than 0.1 that a non-flying bird exists, then the
probability of the universal must be $< 0.9$.
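As a worked instance of this bound, suppose, purely for illustration, that some individual named by a constant $c$ is believed to degree 0.95 to be a non-flying bird (both the constant and the number are assumptions made here). Since an existential is at least as probable as any of its instances, the universal is then bounded as follows:

```latex
\begin{aligned}
p[\forall x\, \mathit{Bird}(x) \rightarrow \mathit{Fly}(x)]
  &= 1 - p[\exists x\, \mathit{Bird}(x) \wedge \neg\mathit{Fly}(x)] \\
  &\le 1 - p[\mathit{Bird}(c) \wedge \neg\mathit{Fly}(c)] \\
  &= 1 - 0.95 = 0.05 .
\end{aligned}
```

A single well-attested exception is therefore enough to push the probability of the universal close to zero, regardless of how many other birds are believed to fly.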

Since universal quantification and its dual, existential quantification, are the
only quantifiers available in a first order language, it does not seem that moving
to first order languages allows us to represent statistical assertions. There
is, however, one more avenue available: conditional probabilities. We have
probabilities attached to sentences; hence, with two sentences we can form
conditional probabilities. It has been suggested (Cheeseman [9]) that
meta-quantified statements of the following form can be used to capture
statistical statements, in particular for the statement about birds:

$$\forall(x)\ p[\mathit{Fly}(x) \mid \mathit{Bird}(x)] > 0.9.$$

In this assertion quantification is occurring at a level outside of the language
(at a meta-level): “$p[\mathit{Fly}(x) \mid \mathit{Bird}(x)]$” is not a formula of
the first order language. This statement is intended to assert that for every term,
$t$, in the object language the conditional probability of the sentence
$\mathit{Fly}(x/t)$, with the variable $x$ substituted by the term $t$, given
$\mathit{Bird}(x/t)$ is $> 0.9$.

statistical  information  could  be  represented  in  this  manner;  however,  I  am  concerned  here 
with  efficient  representations. 
However, this formulation also falls prey to any known exception. Say
that there is some individual, denoted by the constant $c$, who is thought to
be a bird, i.e., $p[\mathit{Bird}(c)]$ is high, and for some reason or other is also
believed to be unable to fly, i.e., $p[\mathit{Fly}(c)]$ is low; then clearly this
statement cannot be true for the instance when $x$ is $c$; hence, the meta-level
universal statement cannot be true. It is important to note that it does not matter
what other things are known about the individual $c$. For example, it does
not matter if $c$ is known to be an ostrich; it is still the case that the
conditional probability of $\mathit{Fly}(c)$ given $\mathit{Bird}(c)$ will not be
$> 0.9$. Hence, there is no way that the statistical statement “More than 90% of
all birds fly” can be represented by the assertion that the conditional
probability is greater than 0.9 for all substitutions of $x$: this assertion will
be false for certain substitutions. The problem here is that the statistical
statement implies that $p[\mathit{Fly}(x) \mid \mathit{Bird}(x)] > 0.9$ for a
random $x$, but a universally quantified $x$ is not the same as a random variable
$x$; furthermore, the simple device of assigning probabilities to sentences of a
logical language does not give you access to random variables. This point has
also been raised by Schubert [10].
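The gap between the two readings can be made concrete with a toy domain. Everything in the sketch below is assumed for illustration: twenty hypothetical birds named `b0` to `b19`, exactly one of which (`b0`) is known not to fly, and degrees of belief that are certain (0 or 1) because the facts are taken as known.

```python
# Hypothetical domain: 20 birds, exactly one of which cannot fly.
birds = [f"b{i}" for i in range(20)]
flies = {b: (b != "b0") for b in birds}

# Statistical reading: the proportion of birds that fly, i.e. the chance that
# a randomly selected bird flies.
proportion = sum(flies[b] for b in birds) / len(birds)

# Meta-universal reading: the conditional probability must exceed 0.9 for
# every single substitution. With the facts known for certain, the degree of
# belief in Fly(t) given Bird(t) is 1 or 0 for each term t.
cond_prob = {b: 1.0 if flies[b] else 0.0 for b in birds}
universal_holds = all(cp > 0.9 for cp in cond_prob.values())

print(proportion, universal_holds)
```

The statistical statement is true of this domain, since 95% of the birds fly, yet the meta-quantified statement is false because the substitution for `b0` fails.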

5  The  Representation  of  Defaults 

There are many different defaults which have a natural statistical justification,
the famous example of “Birds fly” being one of them. A natural
reason  for  assuming  by  default  that  a  particular  bird  can  fly  is  simply  the 
fact  that,  in  a  statistical  sense,  most  birds  do  fly.  This  is  not  to  say  that  all 
defaults  have  a  statistical  interpretation:  there  are  many  different  notions  of 
typicality  which  do  not  have  a  straightforward  statistical  interpretation,  e.g., 
“Dogs  give  live  birth”  (Carlson  [11],  Nutter  [12],  also  see  Brachman  [13]  for 
a  discussion  of  different  notions  of  typicality). 

Since probabilities attached to the sentences of a logic do not offer any
easy way of representing statistical assertions, it is not surprising that
attempts to use this formalism to give meaning to defaults lead to certain
difficulties.

Recently Geffner and Pearl [4] have proposed giving semantics to defaults
through meta-quantified conditional probability statements (also Pearl
[5]²). For example, the default “Birds fly” is given meaning through

²Pearl uses a slightly different notion of probabilities within $\epsilon$ of one. The technical
the meta-quantified statement $\forall x\ p[\mathit{Fly}(x) \mid \mathit{Bird}(x)] \approx 1$.
In order to allow penguins to be non-flying birds they have the separate default rule:
$\forall x\ p[\neg\mathit{Fly}(x) \mid \mathit{Penguin}(x)] \approx 1$. They also have
universal statements like $\forall x\, \mathit{Penguin}(x) \rightarrow \mathit{Bird}(x)$.
The probability of these universals is one; thus, as discussed above, every
instantiation must also have probability one.

To examine the difficulties which arise from this approach consider the
following example. Say that we have a logical language with the predicates
$\mathit{Bird}$, $\mathit{Fly}$, and $\mathit{Penguin}$, some set of terms $\{t_i\}$,
and a probability distribution over the sentences of the language which satisfies
the default rules, i.e., for all terms $t_i$,
$p[\mathit{Fly}(t_i) \mid \mathit{Bird}(t_i)] \approx 1$ and
$p[\neg\mathit{Fly}(t_i) \mid \mathit{Penguin}(t_i)] \approx 1$, and in which the
universal $\forall x\, \mathit{Penguin}(x) \rightarrow \mathit{Bird}(x)$ has
probability one. Some simple facts which follow from the universal having
probability one are that for all terms $t_i$,
$p[\mathit{Bird}(t_i)] \ge p[\mathit{Penguin}(t_i)]$, and
$p[\mathit{Bird}(t_i) \wedge \mathit{Penguin}(t_i)] = p[\mathit{Penguin}(t_i)]$.
Consider the following derivation:

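$$
\begin{aligned}
1 &\approx p[\mathit{Fly}(t_i) \mid \mathit{Bird}(t_i)] \\
  &= \frac{p[\mathit{Fly}(t_i) \wedge \mathit{Bird}(t_i) \wedge \neg\mathit{Penguin}(t_i)]}{p[\mathit{Bird}(t_i)]}
   + \frac{p[\mathit{Fly}(t_i) \wedge \mathit{Bird}(t_i) \wedge \mathit{Penguin}(t_i)]}{p[\mathit{Bird}(t_i)]} \\
  &\le \frac{p[\mathit{Fly}(t_i) \wedge \mathit{Bird}(t_i) \wedge \neg\mathit{Penguin}(t_i)]}{p[\mathit{Penguin}(t_i)]}
   + \frac{p[\mathit{Fly}(t_i) \wedge \mathit{Bird}(t_i) \wedge \mathit{Penguin}(t_i)]}{p[\mathit{Penguin}(t_i)]} \\
  &\le \frac{p[\neg\mathit{Penguin}(t_i)]}{p[\mathit{Penguin}(t_i)]}
   + \frac{p[\mathit{Fly}(t_i) \wedge \mathit{Penguin}(t_i)]}{p[\mathit{Penguin}(t_i)]} \\
  &= \mathrm{Odds}[\neg\mathit{Penguin}(t_i)] + {\approx 0}
\end{aligned}
$$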

That is, the constraints imply that for any term $t_i$ the odds that $t_i$ is not a
penguin must be at least $\approx 1$; hence $p[\mathit{Penguin}(t_i)]$ cannot be much
greater than 0.5. Since $\approx 0.5$ is an upper bound on the probability of all of
these sentences, it must also be the case that it is an upper bound on the probability
of the sentence $\exists x\, \mathit{Penguin}(x)$, by equation (*).
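To see how the bound behaves numerically, the approximate equalities can be replaced by an explicit tolerance. The sketch below assumes, as an illustration only, that "$\approx 1$" means "at least $1 - \epsilon$", so that $p[\mathit{Fly}(t_i) \mid \mathit{Penguin}(t_i)] \le \epsilon$; the function name `max_penguin_probability` and the parameter `eps` are introduced here. The last line of the derivation then reads $1 - \epsilon \le (1 - q)/q + \epsilon$ with $q = p[\mathit{Penguin}(t_i)]$, which rearranges to $q \le 1/(2 - 2\epsilon)$. This is the bound implied by the derivation above, not necessarily the tightest bound obtainable from the constraints.

```python
def max_penguin_probability(eps):
    """Upper bound on p[Penguin(t)] implied by the derivation in the text.

    Assumes p[Fly(t) | Bird(t)] >= 1 - eps and p[Fly(t) | Penguin(t)] <= eps,
    so that 1 - eps <= (1 - q) / q + eps, i.e. q <= 1 / (2 - 2 * eps).
    """
    if not 0.0 <= eps < 0.5:
        raise ValueError("eps must lie in [0, 0.5)")
    return 1.0 / (2.0 - 2.0 * eps)


for eps in (0.0, 0.01, 0.05):
    print(eps, round(max_penguin_probability(eps), 4))
```

For small tolerances the bound stays close to 0.5, matching the observation that the probability that any given term denotes a penguin cannot be much greater than one half.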

That is, if we accept the defaults we must reject any sort of high level
of belief in the existence of penguins.

This  problem  is  similar  to  the  problem  discussed  in  the  previous  section; 
a  universally  quantified  variable  is  not  the  same  as  a  random  variable,  and 
cannot  be  used  to  encode  a  random  variable. 

differences  between  this  approach  and  that  of  Geffner  and  Pearl  do  not  make  any  difference 
to  the  following  discussion;  the  anomalies  presented  also  appear  in  Pearl’s  system. 
6  Conclusions

It has been demonstrated that although probabilities can be assigned to the
sentences of any first order language, the resulting probability logics are not
powerful enough to efficiently represent statistical assertions. It has also been
demonstrated that attempts to give defaults a probabilistic semantics using
these probability logics lead to certain anomalies.

Statistical  facts,  it  has  been  argued,  give  a  natural  justification  to  many 
default  inferences.  This  implies  that  probabilities  might  still  be  useful  for 
giving  semantics  to  default  rules  and  a  justification  to  default  inferences.  For 
example,  the  default  rule  “Birds  fly”  could  be  represented  as  a  statistical 
assertion  that  some  large  percentage  of  birds  fly,  and  the  default  inference 
“Tweety  flies”  could  be  given  the  justification  that  Tweety  probably  does  fly 
if  to  the  best  of  our  knowledge  Tweety  was  a  randomly  selected  bird. 

Probability  logics  which  accomplish  this  have  already  been  developed 
(Bacchus  [14],  Kyburg  [15]),  but  these  logics  go  beyond  the  simple  device  of 
assigning probabilities to the sentences of a logical language. Bacchus uses
a logic which has a probability distribution over the domain of discourse;
this logic is capable of expressing statistical information, and possesses a
sound and complete proof theory capable of reasoning with statistical facts.
Default inferences are handled by an inductive mechanism which forms
defeasible conclusions, conclusions which can be defeated by new information.
Kyburg  uses  an  object  language/meta-language  formalism,  and  has  explored 
the  inductive  formation  of  defeasible  conclusions  in  greater  detail  [16]. 